Abstract 


Embedded distributed systems have become an integral 
part of safety-critical computing applications, necessitating system 
designs that incorporate fault tolerant clock synchronization in 
order to achieve ultra-reliable assurance levels. Many efficient 
clock synchronization protocols do not, however, address 
Byzantine failures, and most protocols that do tolerate Byzantine 
failures do not self-stabilize. The Byzantine self-stabilizing 
clock synchronization algorithms that exist in the literature 
are based either on unjustifiably strong assumptions about the initial 
synchrony of the nodes or on the existence of a common pulse at 
the nodes. The Byzantine self-stabilizing clock synchronization 
protocol presented here does not rely on any assumptions about 
the initial state of the clocks. Furthermore, there is neither a 
central clock nor an externally generated pulse system. The 
proposed protocol converges deterministically, is scalable, and 
self-stabilizes in a short amount of time. The convergence time is 
linear with respect to the self-stabilization period. Proofs of the 
correctness of the protocol as well as the results of formal 
verification efforts are reported. 


1. Introduction 


Synchronization and coordination algorithms are part of distributed computer systems. 
Clock synchronization algorithms are essential for managing the use of resources and controlling 
communication in a distributed system. Also, a fundamental criterion in the design of a robust 
distributed system is to provide the capability of tolerating and potentially recovering from 
failures that are not predictable in advance. Overcoming such failures is most suitably addressed 
by tolerating Byzantine faults [Lamport 1982]. A Byzantine-fault model encompasses all 
unexpected failures, including transient ones, within the limitations of the maximum number of 
faults at a given time. Driscoll et al. [Driscoll 2003] addressed the frequency of occurrences of 
Byzantine faults in practice and the necessity to tolerate Byzantine faults in ultra-reliable 
distributed systems. A distributed system tolerating as many as F Byzantine faults requires a 
network size of more than 3F nodes. Lamport et al. [Lamport 1982, Lamport 1985] were the 
first to present the problem and show that Byzantine agreement cannot be achieved for fewer 
than 3F + 1 nodes. Dolev et al. [Dolev 1984] proved that at least 3F + 1 nodes are necessary for 
clock synchronization in the presence of F Byzantine faults. 

A distributed system is defined to be self-stabilizing if, from an arbitrary state and in the 
presence of a bounded number of Byzantine faults, it is guaranteed to reach a legitimate state in a 
finite amount of time and remain in a legitimate state as long as the number of Byzantine faults 
is within a specific bound. A legitimate state is a state where all good clocks in the system are 
synchronized within a given precision bound. 

Therefore, a self-stabilizing system is able to start in a random state and recover from 
transient failures after the faults dissipate. The concept of self-stabilizing distributed 
computation was first presented in a classic paper by Dijkstra [Dijkstra 1974]. In that paper, he 
speculated whether it would be possible for a set of machines to stabilize their collective 
behavior in spite of unknown initial conditions and distributed control. The idea was that the 
system should be able to converge to a legitimate state within a bounded amount of time, by 
itself, and without external intervention. 

This paper addresses the problem of synchronizing clocks in a distributed system in the 
presence of Byzantine faults. There are many algorithms that address permanent faults [Srikanth 
1985], where the issue of transient failures is either ignored or inadequately addressed. There are 
many efficient Byzantine clock synchronization algorithms that are based on assumptions on 
initial synchrony of the nodes [Srikanth 1985, Welch 1988] or existence of a common pulse at 
the nodes [Dolev 2004]. There are many clock synchronization algorithms that are based on 
randomization and, therefore, are non-deterministic [Dolev 2004]. Some clock synchronization 
algorithms have provisions for initialization and/or reintegration. However, solving these special 
cases is insufficient to make the algorithm self-stabilizing. A self-stabilizing algorithm 
encompasses these special scenarios without having to address them separately. The main 
challenges associated with self-stabilization are the complexity of the design and the proof of 
correctness of the protocol. Another difficulty is achieving efficient convergence time for the 
proposed self-stabilizing protocol. 

Other recent developments in this area are the algorithms developed by Daliot et al. 
[Daliot 2003A, Daliot 2003B]. The algorithm in [Daliot 2003B] is called the Byzantine 
self-stabilization pulse synchronization (BSS-Pulse-Synch) protocol. A flaw in the BSS-Pulse-Synch 
protocol was found and documented in [Malekpour 2006]. The biologically inspired Pulse 
Synchronization protocol in [Daliot 2003A] has claims of self-stabilization, but no mechanized¹ 
proofs are provided. 

In this paper a rapid Byzantine self-stabilizing clock synchronization protocol is 
presented that self-stabilizes from any state, tolerates bursts of transient failures, and 
deterministically converges within a linear convergence time with respect to the self-stabilization 
period. Upon self-stabilization, all good clocks proceed synchronously. This protocol has been 
the subject of rigorous verification efforts that support the claim of correctness. 

The following sections describe the proposed protocol in detail. The report begins with 
the underlying topology and network model, followed by a description of the protocol. A proof 
of the protocol is presented in the following section. The protocol characteristics are then 
discussed. A summary of the simulation and model checking results is reported. Some of the 
potential applications are enumerated, followed by potential future work in this area. 


2. Topology 

The underlying topology considered here is a network of K nodes that communicate by 
exchanging messages through a set of communication channels. The communication channels 
are assumed to connect a set of source nodes to a set of destination nodes such that the source of 
a given message is distinctly identifiable from other sources of messages. This system of K 
nodes can tolerate a maximum of F Byzantine faulty nodes, where K ≥ 3F + 1. Therefore, the 
minimum number of good nodes in the system, G, is given by G = K - F and thus G ≥ (2F + 1) 
nodes. Let K_G represent the set of good nodes. The nodes communicate with each other by 
exchanging broadcast messages. Broadcast of a message to all other nodes is realized by 
transmitting the message to all other nodes at the same time. The source of a message is 
assumed to be uniquely identifiable. The communication network does not guarantee any order 
of arrival of a transmitted message at the receiving nodes. To paraphrase Kopetz [Kopetz 1997], a 
consistent delivery order of a set of messages does not necessarily reflect the temporal or causal 
order of the events. 

Each node is driven by an independent local physical oscillator. The oscillators of good 
nodes have a known bounded drift rate, 1 ≫ ρ > 0, with respect to real time. Each node has two 
logical time clocks, Local_Timer and State_Timer, which locally keep track of the passage of 
time as indicated by the physical oscillator. In the context of this report, all references to clock 
synchronization and self-stabilization of the system are with respect to the State_Timer and the 
Local_Timer of the nodes. There is neither a central clock nor an externally generated global 
pulse. The communication channels and the nodes can behave arbitrarily, provided that 
eventually the system adheres to the system assumptions (see Section 3.5). 

¹ A mechanized proof is a formal verification via either a theorem prover or model checker. 

The latency of interdependent communications between the nodes is expressed in terms 
of the minimum event-response delay, D, and network imprecision, d. These parameters are 
described with the help of Figure 1. In Figure 1, a message transmitted by node N_i at real time t_0 
is expected to arrive at all destination nodes N_j, be processed, and have subsequent messages 
generated by N_j within the time interval of [t_0 + D, t_0 + D + d] for all N_j ∈ K_G. Communication 
between independently clocked nodes is inherently imprecise. The network imprecision, d, is the 
maximum time difference between all good receivers, N_j, of a message from N_i with respect to 
real time. The imprecision is due to the drift of the clocks with respect to real time, jitter, 
discretization error, and slight variations in the communication delay due to various causes such 
as temperature effects and differences in the lengths of the physical communication medium. 
These two parameters are assumed to be bounded such that D ≥ 1 and d ≥ 0 and both have values 
with units of real time nominal tick. For the remainder of this report, all references to time are 
with respect to the nominal tick and are simply referred to as clock ticks. 

[Figure: timeline marking the instants t_0, t_0 + D, and t_0 + D + d, with the intervals D and d labeled.] 

Figure 1. Event-response delay, D, and network imprecision, d. 


3. Protocol Description 

The self-stabilization problem has two facets. First, it is inherently event-driven and, 
second, it is time-driven. Most attempts at solving the self-stabilization problem have focused 
only on the event-driven aspect of this problem. Additionally, all efforts toward solving this 
problem must recognize that the system undergoes two distinct phases, un-stabilized and 
stabilized, and that once stabilized, the system state needs to be preserved. The protocol 
presented here properly merges the time and event driven aspects of this problem in order to 
self-stabilize the system in a gradual and yet timely manner. Furthermore, this protocol is based on 
the concept of continual vigilance over the state of the system in order to maintain and guarantee its 
stabilized status, and a continual reaffirmation of nodes by declaring their internal status. 
Finally, initialization and/or reintegration are not treated as special cases. These scenarios are 
regarded as an inherent part of this self-stabilizing protocol. 

The self-stabilization events are captured at a node via a selection function that is based 
on received valid messages from other nodes. When such an event occurs, it is said that a node 
has accepted or an accept event has occurred. 

When the system is stabilized, it is said to be in the steady state. 

In order to achieve self-stabilization, the nodes communicate by exchanging two 
self-stabilization messages labeled Resync and Affirm. The Resync message reflects the time-driven 
aspect of this self-stabilization protocol, while the Affirm message reflects the event-driven 
aspect of it. The Resync message is transmitted when a node realizes that the system is no longer 
stabilized or as a result of a resynchronization timeout. It indicates that the originator of the 
Resync message has to reset and try to reengage in the self-stabilization process with other nodes. 
The Affirm message is transmitted periodically and at specific intervals primarily in response to a 
legitimate self-stabilization accept event at the node. The Affirm message either indicates that 
the node is in the transition process to another state in its attempt toward synchronization, or 
reaffirms that the node will remain synchronized. The timing diagram of transmissions of a good 
node during the steady state is depicted in Figure 2. In the following figures, Resync messages 
are represented as R and Affirm messages are represented as A. The line segments indicate the 
time of the transmission of messages. As depicted, the expected sequence of messages 
transmitted by a good node is a Resync message followed by a number of Affirm messages, i.e. 
RAAA ... AAARAA. The exact number of consecutive Affirm messages will be accounted for 
later in this report. 

[Figure: timeline of the R and A messages transmitted by a good node along the time axis.] 

Figure 2. Timing diagram of transmissions of a good node during the steady state. 

The time difference between the interdependent consecutive events is expressed in terms 
of the minimum event-response delay, D, and network imprecision, d. As a result, the approach 
presented here is expressed as a self-stabilization of the system as a function of the expected time 
separation between the consecutive Affirm messages, Δ_AA. To guarantee that a message from a 
good node is received by all other good nodes before a subsequent message is transmitted, Δ_AA is 
constrained such that Δ_AA ≥ (D + d). Unless stated otherwise, all time dependent parameters of 
this protocol are measured locally and expressed as functions of Δ_AA. 

In Figure 3, node N_i is shown to transmit two consecutive Affirm messages. In the steady 
state, N_i receives one Affirm message from every good node between any two consecutive Affirm 
messages it transmits. Since the messages may arrive at any time after the transmission of an 
Affirm message, the accept event can occur at any time prior to the transmission of the next 
Affirm message. 

Figure 3. Typical activities of N_i between two A messages in a stabilized system. 

Three fundamental parameters characterize the self-stabilization protocol presented 
here, namely K, D, and d. The number of faulty nodes, F, the number of good nodes, G, and the 
remaining parameters that are subsequently enumerated are derived parameters and are based 
on these three fundamental parameters. Furthermore, except for K, F, and G, which are integer 
numbers, all other parameters are real numbers. In particular, Δ_AA is used as a threshold value for 
monitoring of proper timing of incoming and outgoing Affirm messages. The derived parameters 
T_A = G - 1 and T_R = F + 1 are used as thresholds in conjunction with the Affirm and Resync 
messages, respectively. 
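
As an illustration, the integer parameters derived from K can be computed as in the following Python sketch. It assumes that F is taken as the largest fault count permitted by the requirement of at least 3F + 1 nodes cited in the Introduction; the class and attribute names are hypothetical and are not part of the protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedParameters:
    """Integer parameters derived from the network size K (illustrative)."""
    K: int  # total number of nodes

    @property
    def F(self) -> int:
        # Assumption: F is the largest value satisfying K >= 3F + 1.
        return (self.K - 1) // 3

    @property
    def G(self) -> int:
        # Minimum number of good nodes, G = K - F.
        return self.K - self.F

    @property
    def T_A(self) -> int:
        # Threshold used with Affirm messages, T_A = G - 1.
        return self.G - 1

    @property
    def T_R(self) -> int:
        # Threshold used with Resync messages, T_R = F + 1.
        return self.F + 1
```

For example, under this assumption a network of K = 7 nodes gives F = 2, G = 5, T_A = 4, and T_R = 3.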


3.1. The Monitor 

The transmitted messages to be delivered to the destination nodes are deposited on 
communication channels. To closely observe the behavior of other nodes, a node employs (K - 1) 
monitors, one monitor for each source of incoming messages as shown in Figure 4. A node 
neither uses nor monitors its own messages. The distributed observation of other nodes localizes 
error detection of incoming messages to their corresponding monitors, and allows for 
modularization and distribution of the self-stabilization protocol process within a node. A 
monitor keeps track of the activities of its corresponding source node. A monitor detects proper 
sequence and timeliness of the received messages from its corresponding source node. A 
monitor reads, evaluates, time stamps, validates, and stores only the last message it receives from 
that node. Additionally, a monitor ascertains the health condition of its corresponding source 
node by keeping track of the current state of that node. As K increases so does the number of 
monitors instantiated in each node. Although similar modules have been used in engineering 
practice and, conceptually, by others in theoretical work, as far as the author is aware this is the 
first use of the monitors as an integral part of a self-stabilization protocol. 

Figure 4. The i-th node, N_i, with its monitors and state machine. 


3.2. The State Machine 

The assessment results of the monitored nodes are utilized by the node in the 
self-stabilization process. The node consists of a state machine and a set of (K - 1) monitors. The state 
machine has two states, Restore state (T) and Maintain state (M), that reflect the current state of 
the node in the system as shown in Figure 5. The state machine describes the collective behavior 
of the node, N_i, utilizing assessment results from its monitors, M_1 .. M_(i-1), M_(i+1) .. M_K as shown in 
Figure 4, where M_j is the monitor for the corresponding node N_j. In addition to the behavior of 
its corresponding source node, a monitor's internal status is influenced by the current state of the 
node's state machine. In a master-slave fashion, when the state machine transitions to another 
state it directs the monitors to update their internal status. 


[Figure: node state machine diagram; transition labels R and A.] 

Figure 5. The node state machine. 

The transitory conditions enable the node to migrate to the Maintain state and are defined as: 

1. The node is in the Restore state, 

2. At least 2F accept events in as many Δ_AA intervals have occurred after the node entered 
   the Restore state, 

3. No valid Resync messages are received for the last accept event. 

The transitory delay is the length of time a node stays in the Restore state. 

The minimum required duration for the transitory delay is 2F·Δ_AA after the node enters the 
Restore state. The maximum duration of the transitory delay is dependent on the number of 
additional valid Resync messages received. Validity of received messages is defined in 
Section 3.3. When the system is stabilized, the maximum delay is a result of receiving valid 
Resync messages from all faulty nodes. Since there are at most F faulty nodes present, during 
the steady state operation the duration of the transitory delay is bounded by [2F·Δ_AA, 3F·Δ_AA]. 
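
The three transitory conditions and the minimum transitory delay can be sketched in Python as follows. This is only an illustration under stated assumptions: the function and argument names are hypothetical, and the caller is assumed to supply the accept-event count and the Resync observation that the node and its monitors would track.

```python
def transitory_conditions_met(in_restore: bool,
                              accept_events: int,
                              valid_resync_at_last_accept: bool,
                              F: int) -> bool:
    """Return True when a node may migrate from Restore to Maintain.

    accept_events is the number of accept events (one per Delta_AA
    interval) observed since the node entered the Restore state.
    """
    return (in_restore                              # condition 1
            and accept_events >= 2 * F              # condition 2
            and not valid_resync_at_last_accept)    # condition 3


def min_transitory_delay(F: int, delta_aa: float) -> float:
    """Minimum transitory delay, 2F * Delta_AA, in clock ticks."""
    return 2 * F * delta_aa
```

With F = 1, for instance, a node needs at least two accept events in two Δ_AA intervals, with no valid Resync message seen for the last one, before it leaves the Restore state.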

A node in either of the Restore or Maintain state periodically transmits an Affirm message 
every Δ_AA. When in the Restore state, it either will meet the transitory conditions and transition 
to the Maintain state, or will remain in the Restore state for the duration of the self-stabilization 
period until it times out and transmits a Resync message. When in the Maintain state, a node 
either will remain in the Maintain state for the duration of the self-stabilization period until it 
times out, or will unexpectedly transition to the Restore state because T_R other nodes have 
transitioned out of the Maintain state. At the transition, the node transmits a Resync message. 

The self-stabilization period is defined as the maximum time interval (during the steady 
state) that a good node engages in the self-stabilization process. In this protocol the 
self-stabilization period depends on the current state of the node. Specifically, the self-stabilization 
period for the Restore state is represented by P_T and the self-stabilization period for the Maintain 
state is represented by P_M. P_T and P_M are expressed in terms of Δ_AA. The length of time a good 
node stays in the Restore state is denoted by L_T. During the steady state L_T is always less than 
P_T. The time a good node stays in the Maintain state is denoted by L_M. When the system is 
stabilized L_M is less than or equal to P_M. The effective self-stabilization period, P_Effective, is the 
time interval between the last two consecutive resets of the Local_Timer of a good node in a 
stabilized system, where P_Effective = L_T + L_M < P_T + P_M. 


In Figure 6 the transitions of a node from the Restore state to the Maintain state (during 
the steady state) are depicted along a timeline of activities of the node. The line segments in 
Figure 6 indicate timing and order of the transmission of messages along the time axis. Two new 
parameters, Δ_RA and Δ_AR, are introduced in this figure in order to clarify other aspects of this 
protocol's behavior. These parameters are defined in terms of Δ_AA. Although a Resync message 
is transmitted immediately after the node realizes that it is no longer stabilized, i.e. 0 < Δ_AR < 
Δ_AA, an Affirm message is transmitted once every Δ_AA, i.e. Δ_RA = Δ_AA. 


[Figure: timeline of the R and A message transmissions of a good node, annotated with Δ_AR, Δ_RA, and Δ_AA.] 

Figure 6. Timing diagram of activities of a good node during the steady state. 

A node keeps track of time by incrementing a logical time clock, State_Timer, once every 
Δ_AA. After the State_Timer reaches P_T or P_M, depending on the current state of the node, the 
node experiences a timeout, transmits a new Resync message, resets the State_Timer, transitions 
to the Restore state, and attempts to resynchronize with other nodes. If the node was in the 
Restore state it remains in that state after the timeout. The current value of this timer reflects the 
duration of the current state of the node. It also provides insight in assessing the state of the 
system in the self-stabilization process. 

In addition to the State_Timer, the node maintains the logical time clock Local_Timer. 
The Local_Timer is incremented once every Δ_AA and is reset only when the node has transitioned 
to the Maintain state and remained in that state for the duration of ⌈Δ_Precision⌉, where Δ_Precision is 
the maximum guaranteed self-stabilization precision. The Local_Timer is intended to be used by 
higher level protocols and is used in assessing the state of the system in the self-stabilization 
process. 

The monitor's status reflects its perception of its corresponding source node. In 
particular, a monitor keeps track of the incoming messages from its corresponding source and 
ensures that only valid messages are stored. If the expected time of arrival of a message is 
violated or if the message arrives out of the expected sequence, then it is marked as invalid. 
Otherwise, it is marked as valid and stored for the host node’s consumption. It is important to 
note that this protocol is expected to be used as the fundamental mechanism in bringing and 
maintaining a system within a known synchronization bound. This protocol neither maintains a 
history of past behavior of the nodes nor does it attempt to classify the nodes into good and 
faulty ones. All such determination about the health status of the nodes in the system is assumed 
to be done by higher level mechanisms. 


3.3. Message Sequence 

An expected sequence is defined as a stream of Affirm messages enclosed by two Resync 
messages where all received messages arrive within their expected arrival times. The time 
interval between the last two Resync messages is represented by Δ_RR. 

The following are three sequences, where '-' represents a missing message: 

• RAAA ... AAAR    expected sequence, all A messages present 

• RA-A ... A--R    unexpected message sequence, missing A messages 

• R-- ... --R      unexpected message sequence, no A messages present 

When a node is in the Restore state, its output sequence of messages has one of two 
patterns. If the node does not transition to the Maintain state, it times out after P_T and its 
expected sequence of output messages will be RAAA ... AAAR, consisting of P_T consecutive A 
messages. In this case, Δ_RR = P_T. On the other hand, when the node synchronizes with other 
nodes, it transitions to the Maintain state before timing out, and its expected sequence of output 
messages will have at least 2F Affirm messages followed by those Affirm messages produced in 
the Maintain state. The shortest amount of time it takes a node to transition to the Maintain state 
is 2F·Δ_AA. The shortest amount of time the node stays in the Maintain state is Δ_AR. Therefore, the 
time separation between any two consecutive Resync messages from a good node is given by 
Δ_RR ≥ 2F·Δ_AA + Δ_AR. As a result, the shortest expected sequence consists of 2F A messages 
enclosed by two R messages with a duration of Δ_RR,min = 2F·Δ_AA + 1 clock ticks. 

When a node is in the Maintain state, it has two possible output sequences of messages. 
If it times out after P_M, its expected sequence of output messages will be RAAA ... AAAR 
consisting of an R message, followed by A messages for when the node was in the Restore state, 
followed by at least P_M consecutive A messages for the duration of the Maintain state, followed 
by another R message. Therefore, (P_T + P_M) ≥ Δ_RR; in other words, Δ_RR,max = (P_T + P_M). On the 
other hand, when the node abruptly transitions out of the Maintain state, its output sequence of 
messages will consist of fewer Affirm messages. The sequence consists of an R message, 
followed by A messages for when the node was in the Restore state, followed by A messages for 
the duration of the Maintain state, followed by another R message. 

As depicted in Figure 6, starting from the last transmission of the Resync message 
consecutive Affirm messages are transmitted at Δ_AA intervals. At the receiving nodes, the 
following definitions hold: 

- A message (Resync or Affirm) from a given source is valid if it is the first message from 
that source. 

- An Affirm message from a given source is early if it arrives earlier than (Δ_AA - d) of its 
previous valid message (Resync or Affirm). 

- A Resync message from a given source is early if it arrives earlier than Δ_RR,min of its 
previous valid Resync message. 

- An Affirm message from a given source is valid if it is not early. 

- A Resync message from a given source is valid if it is not early. 

The protocol works when the received messages do not violate their timing requirements. 
However, in addition to inspecting the timing requirements, examining the expected sequence of 
the received messages provides stronger error detection at the nodes. 
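
The timing definitions above can be sketched as a small per-source monitor in Python. This is a minimal illustration, not the report's implementation: the class, its attribute names, and the use of locally measured arrival times in clock ticks are assumptions, and only the timing checks are modeled (the expected-sequence check is omitted).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Monitor:
    """Timing checks for messages from one source node (illustrative)."""
    delta_aa: float       # expected separation between Affirm messages
    d: float              # network imprecision
    delta_rr_min: float   # shortest separation between Resync messages
    last_valid: Optional[float] = None         # last valid message time
    last_valid_resync: Optional[float] = None  # last valid Resync time

    def invalid_affirm(self, now: float) -> bool:
        # The first message from a source is valid; later ones must not
        # be early with respect to the previous valid message.
        if self.last_valid is None:
            return False
        return now - self.last_valid < self.delta_aa - self.d

    def invalid_resync(self, now: float) -> bool:
        if self.last_valid_resync is None:
            return False
        return now - self.last_valid_resync < self.delta_rr_min

    def receive(self, kind: str, now: float) -> bool:
        """Validate a message; store its arrival time only if valid."""
        if kind == "Resync":
            if self.invalid_resync(now):
                return False
            self.last_valid_resync = now
        elif kind == "Affirm":
            if self.invalid_affirm(now):
                return False
        else:
            return False  # any other input is ignored
        self.last_valid = now
        return True
```

For instance, with Δ_AA = 10, d = 1, and F = 1 (so that the shortest Resync separation is 2F·Δ_AA + 1 = 21 ticks), an Affirm message arriving 5 ticks after a valid Resync message is rejected as early, while one arriving after 9 ticks is accepted.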


3.4. Protocol Functions 

The functions used in this protocol are described in this section. 

Two functions, InvalidAffirm() and InvalidResync(), are used by the monitors. The 
InvalidAffirm() function determines whether or not a received Affirm message is valid. The 
InvalidResync() function determines if a received Resync message is valid. When either of these 
functions returns a true value, it is indicative of an unexpected behavior by the corresponding 
source node. 

The Accept() function is used by the state machine of the node in conjunction with the 
threshold value T_A = G - 1. When at least T_A valid messages (Resync or Affirm) have been 
received, this function returns a true value indicating that an accept event has occurred and such 
event has also taken place in at least F other good nodes. When a node accepts, it consumes all 
valid messages used in the accept process by the corresponding function. Consumption of a 
message is the process by which a monitor is informed that its stored message, if it existed and 
was valid, has been utilized by the state machine. 

The Retry() function is used by the state machine of the node with the threshold value 
T_R = F + 1. This function determines if at least T_R other nodes have transitioned out of the 
Maintain state. A node, via its monitors, keeps track of the current state of other nodes. When at 
least T_R valid Resync messages from as many nodes have been received, this function returns a 
true value indicating that at least one good node has transitioned to the Restore state. This 
function is used to transition from the Maintain state to the Restore state. 

The TransitoryConditionsMet() function is used by the state machine of the node to 
determine proper timing of the transition from the Restore state to the Maintain state. This 
function keeps track of the accept events, by incrementing the Accept_Event_Counter, to 
determine if at least 2F accept events in as many Δ_AA intervals have occurred. It returns a true 
value when the transitory conditions (see Section 3.2) are met. 

The TimeOutRestore() function uses P_T as a boundary value and asserts a timeout 
condition when the value of the State_Timer has reached P_T. Such timeout triggers the node to 
reengage in another round of self-stabilization process. This function is used when the node is in 
the Restore state. 

The TimeOutMaintain() function uses P_M as a boundary value and asserts a timeout 
condition when the value of the State_Timer has reached P_M. Such timeout triggers the node to 
reengage in another round of synchronization. This function is used when the node is in the 
Maintain state. 

In addition to the above functions, the state machine utilizes the TimeOutAcceptEvent() 
function. This function is used to regulate the transmission time of the next Affirm message. 
This function maintains a DeltaAA_Timer by incrementing it once per local clock tick and once it 
reaches the transmission time of the next Affirm message, Δ_AA, it returns a true value. In the event of 
such timeout, the node transmits an Affirm message. 
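
The two threshold functions described in this section, Accept() and Retry(), can be sketched in Python as follows. The signatures are hypothetical: the sketch assumes the caller passes the number of valid stored messages and the set of sources from which valid Resync messages were received, as reported by the node's monitors.

```python
from typing import Set


def accept(valid_message_count: int, G: int) -> bool:
    """Accept event: at least T_A = G - 1 valid messages are stored."""
    t_a = G - 1
    return valid_message_count >= t_a


def retry(valid_resync_sources: Set[int], F: int) -> bool:
    """Retry: valid Resync messages from at least T_R = F + 1 nodes.

    With at most F faulty nodes, F + 1 distinct sources guarantee that
    at least one of them is a good node.
    """
    t_r = F + 1
    return len(valid_resync_sources) >= t_r
```

In a four-node system with F = 1 and G = 3, for example, two valid messages are enough for an accept event, and valid Resync messages from two distinct sources trigger a retry.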


3.5. System Assumptions 

1. The source of the transient faults has dissipated. 

2. All good nodes actively participate in the self-stabilization process and execute the 
protocol. 

3. At most F of the nodes are faulty. 

4. The source of a message is distinctly identifiable by the receivers from other sources of 
messages. 

5. A message sent by a good node will be received and processed by all other good nodes 
within Δ_AA, where Δ_AA ≥ (D + d). 

6. The initial values of the state and all variables of a node can be set to any arbitrary value 
within their corresponding range. In an implementation, it is expected that some local 
capabilities exist to enforce type consistency of all variables. 


3.6. The Self-Stabilizing Clock Synchronization Problem 

To simplify the presentation of this protocol, it is assumed that all time references are 
with respect to a real time t_0 when the system assumptions are satisfied and the system operates 
within the system assumptions. Let 

• C be the maximum convergence time, 

• Δ_Local_Timer(t), for real time t, be the maximum time difference of the Local_Timers of any 
two good nodes N_i and N_j, and 

• Δ_Precision be the maximum guaranteed self-stabilization precision between the Local_Timers 
of any two good nodes N_i and N_j in the presence of a maximum of F faulty nodes, ∀ N_i, 
N_j ∈ K_G. 


Convergence: From any state, the system converges to a self-stabilized state after a finite 
amount of time. 

1. ∀ N_i, N_j ∈ K_G, Δ_Local_Timer(C) ≤ Δ_Precision. 

2. ∀ N_i, N_j ∈ K_G, at C, N_i perceives N_j as being in the Maintain state. 

Closure: When all good nodes have converged such that Δ_Local_Timer(C) ≤ Δ_Precision at time C, the 
system shall remain within the self-stabilization precision Δ_Precision for t ≥ C, for real time t. 

∀ N_i, N_j ∈ K_G, t ≥ C, Δ_Local_Timer(t) ≤ Δ_Precision, 


where, 

\[ C = (2 P_T + P_M)\,\Delta_{AA}, \]

\[
\begin{aligned}
\Delta_{Local\_Timer}(t) = \min\bigl(\, & \max(Local\_Timer_i,\ Local\_Timer_j) - \min(Local\_Timer_i,\ Local\_Timer_j), \\
 & \max(Local\_Timer_i - \lceil\Delta_{Precision}\rceil,\ Local\_Timer_j - \lceil\Delta_{Precision}\rceil) \\
 & \quad - \min(Local\_Timer_i - \lceil\Delta_{Precision}\rceil,\ Local\_Timer_j - \lceil\Delta_{Precision}\rceil) \,\bigr),
\end{aligned}
\]

\[ \lceil\Delta_{Precision}\rceil = \mathrm{truncate}(\Delta_{Precision} + 0.5), \]

and, 

(Local_Timer - ⌈Δ_Precision⌉) is the ⌈Δ_Precision⌉-th previous value of the Local_Timer 

and, 

\[ \Delta_{Precision} = (3F - 1)\,\Delta_{AA} - D + \Delta_{Drift}, \]


where the amount of drift from the initial precision is given by 

\[ \Delta_{Drift} = \bigl((1 + \rho) - 1/(1 + \rho)\bigr)\, P_M\, \Delta_{AA}. \]
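
As a numeric illustration of the convergence time, the drift term, and the rounding rule defined in this section, the following Python sketch evaluates them for given parameters. The function names are hypothetical, and the parameter values in the usage note are illustrative only and are not taken from this report.

```python
import math


def convergence_time(p_t: float, p_m: float, delta_aa: float) -> float:
    """Maximum convergence time, C = (2 P_T + P_M) * Delta_AA."""
    return (2 * p_t + p_m) * delta_aa


def drift(rho: float, p_m: float, delta_aa: float) -> float:
    """Drift term, ((1 + rho) - 1 / (1 + rho)) * P_M * Delta_AA."""
    return ((1 + rho) - 1 / (1 + rho)) * p_m * delta_aa


def rounded(delta_precision: float) -> int:
    """Rounding rule of this section: truncate(Delta_Precision + 0.5)."""
    return math.trunc(delta_precision + 0.5)
```

With the hypothetical values P_T = 10, P_M = 20, and Δ_AA = 5 clock ticks, convergence_time(10, 20, 5) evaluates to 200 clock ticks. The drift term vanishes when ρ = 0 and grows linearly with P_M and Δ_AA.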



4. The Byzantine-Fault Tolerant Self-Stabilizing Protocol for Distributed 
Clock Synchronization Systems 


The presented protocol is described in Figure 7 and consists of a state machine and a set 
of monitors which execute once every local oscillator tick. 


Monitor: 

case (incoming message from the corresponding node) 
{Resync: 
    if InvalidResync() then 
        Invalidate the message 
    else 
        Validate and store the message, 
        Set state status of the source. 

 Affirm: 
    if InvalidAffirm() then 
        Invalidate the message 
    else 
        Validate and store the message. 

 Other: 
    Do nothing. 
} // case 


Node: 

case (state of the node) 
{Restore: 
    if TimeOutRestore() then 
        Transmit Resync message, 
        Reset State_Timer, 
        Reset DeltaAA_Timer, 
        Reset Accept_Event_Counter, 
        Stay in Restore state, 

    elsif TimeOutAcceptEvent() then 
        Transmit Affirm message, 
        Reset DeltaAA_Timer, 
        if Accept() then 
            Consume valid messages, 
            Clear state status of the sources, 
            Increment Accept_Event_Counter, 
            if TransitoryConditionsMet() then 
                Reset State_Timer, 
                Go to Maintain state, 
            else 
                Stay in Restore state. 
        else 
            Stay in Restore state., 
    else 
        Stay in Restore state. 

 Maintain: 
    if TimeOutMaintain() or Retry() then 
        Transmit Resync message, 
        Reset State_Timer, 
        Reset DeltaAA_Timer, 
        Reset Accept_Event_Counter, 
        Go to Restore state, 

    elsif TimeOutAcceptEvent() then 
        if Accept() then 
            Consume valid messages., 
        if (State_Timer = ⌈Δ_Precision⌉) 
            Reset Local_Timer., 
        Transmit Affirm message, † 
        Reset DeltaAA_Timer, 
        Stay in Maintain state, 
    else 
        Stay in Maintain state. 
} // case 



Figure 7. The self-stabilization protocol.

4.1. Semantics of the pseudo-code

• Indentation is used to show a block of sequential statements.

• ',' is used to separate sequential statements.

• '.' is used to end a statement.

• '.,' is used to mark the end of a statement and at the same time to separate it from other
sequential statements.

† In a variation of this protocol and in conjunction with a higher level mechanism, a good node
stops transmitting Affirm messages after it is determined by the higher level mechanism that
the system has stabilized. It follows from Theorem StopContinuousTransmit that such a
variation preserves the self-stabilization properties. Nevertheless, such optimization in the
number of exchanged self-stabilization messages comes at the cost of delaying error detection,
introducing jitter in the system, and prolonging the self-stabilization process.
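The Node block of Figure 7 can be read as a two-state machine evaluated once per tick. The sketch below is one possible rendering under stated assumptions, not the author's code: the six predicates are injected as zero-argument callables because their definitions live elsewhere in the paper, message consumption and timer advancement are not modelled, and the `Node` fields, `predicates` helper, and `step` signature are hypothetical names.

```python
from dataclasses import dataclass, field

RESTORE, MAINTAIN = "Restore", "Maintain"
PREDICATES = ("TimeOutRestore", "TimeOutAcceptEvent", "Accept",
              "TransitoryConditionsMet", "TimeOutMaintain", "Retry")


@dataclass
class Node:
    state: str = RESTORE
    state_timer: int = 0
    delat_aa_timer: int = 0          # mirrors DelatAA_Timer in Figure 7
    accept_event_counter: int = 0
    local_timer: int = 0
    sent: list = field(default_factory=list)


def predicates(*true_names):
    """Table of zero-argument predicates; only the named ones return True."""
    return {name: (lambda v=(name in true_names): v) for name in PREDICATES}


def transmit(node, kind):
    """Record a transmitted message and reset the inter-message timer."""
    node.sent.append(kind)
    node.delat_aa_timer = 0


def step(node, p, precision_ticks):
    """One evaluation of the Node block; p maps predicate names to callables."""
    if node.state == RESTORE:
        if p["TimeOutRestore"]():
            transmit(node, "Resync")
            node.state_timer = 0
            node.accept_event_counter = 0
        elif p["TimeOutAcceptEvent"]():
            transmit(node, "Affirm")
            if p["Accept"]():
                # Consuming messages and clearing source status is not modelled.
                node.accept_event_counter += 1
                if p["TransitoryConditionsMet"]():
                    node.state_timer = 0
                    node.state = MAINTAIN
    else:  # MAINTAIN
        if p["TimeOutMaintain"]() or p["Retry"]():
            transmit(node, "Resync")
            node.state_timer = 0
            node.accept_event_counter = 0
            node.state = RESTORE
        elif p["TimeOutAcceptEvent"]():
            if node.state_timer == precision_ticks:
                node.local_timer = 0
            transmit(node, "Affirm")
```

For instance, `step(Node(), predicates("TimeOutAcceptEvent", "Accept", "TransitoryConditionsMet"), 5)` moves a fresh node from Restore to Maintain after sending one Affirm message.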


5. Proof of the Protocol 

The approach for the proof is to show that a system of $K \ge 3F + 1$ nodes converges from
any condition to a state where all good nodes are in the Maintain state. This system is then
shown to remain within the timing bounds of the self-stabilization precision $\Delta_{Precision}$. The
Lemmas and Theorems are presented in this section.

Since the oscillator drift rate, $\rho$, does not play a significant role in the convergence
process, it is omitted from the expressions regarding parameters, constants, equations, and the
proofs. However, $\rho$ does affect the closure property and is included in expressions regarding
$\Delta_{Precision}$. Omission of $\rho$ does not change the behavior of the protocol or the validity of the
proofs.

Assumptions: All good nodes are active and the system operates within the system assumptions. 
In this proof, unless otherwise stated in the Lemmas and Theorems, no other assumptions are 
made about the system. Also, throughout the proofs, unless stated otherwise, all references to 
the Resync and Affirm messages are with respect to valid messages. 

A node behaves properly if it executes the protocol. 

Lemma TransmitEveryΔAA - A good node in either Restore or Maintain state transmits at least
one message (Resync or Affirm) every $\Delta_{AA}$ interval.

Proof - It follows from the protocol that the DelatAA_Timer is reset after transmission of
a self-stabilization message (Resync or Affirm). It is expressed in function
TimeOutAcceptEvent() that the node transmits an Affirm message every $\Delta_{AA}$ interval.
Additionally, if after transmitting an Affirm message and within the next $\Delta_{AA}$ interval, a node
times out to engage in another round of self-stabilization process, it will also transmit a Resync
message within that $\Delta_{AA}$ interval. ♦

Theorem ResyncWithinP_T - A good node remaining in the Restore state transmits a Resync
message within at most $P_T\,\Delta_{AA}$ clock ticks.

Proof - It is expressed in function TimeOutRestore() that if a node remains in the Restore
state and does not transition to the Maintain state, it will time out within $P_T\,\Delta_{AA}$ clock ticks,
transmit a Resync message, and stay in the Restore state. ♦

Theorem RestoreWithinP_M - A good node in the Maintain state transitions to the Restore state
within at most $P_M\,\Delta_{AA}$ clock ticks.

Proof - It follows from the protocol that a node in the Maintain state will transition from
the Maintain state to the Restore state either because of a resynchronization timeout, as
expressed in function TimeOutMaintain(), or when the system becomes unstabilized, as
expressed in function Retry(). Upon transitioning to the Restore state, the node transmits a
Resync message. Since the longest such time interval is due to the timeout, the node transmits a
Resync message in at most $P_M\,\Delta_{AA}$ clock ticks. ♦

Lemma DeltaRRmin - The shortest time interval between any two consecutive Resync
messages from a good node is $2F\Delta_{AA} + 1$ clock ticks.

Proof - From the definition of the transitory conditions in Section 3.2, the minimum
required duration for the transitory delay is $2F\Delta_{AA}$ after the node entered the Restore state. The
shortest amount of time the node stays in the Maintain state is $\Delta_{AR}$, as shown in Figure 6.
Therefore, the time separation between any two consecutive Resync messages from a good node
is given by $\Delta_{RR} \ge 2F\Delta_{AA} + \Delta_{AR}$. As a result, $\Delta_{RR,min} = 2F\Delta_{AA} + 1$ clock ticks. ♦
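As a small illustration of Lemma DeltaRRmin, the sketch below computes the minimum spacing and applies it as a spacing test on arrival times. It is only an assumption about how such a check might look; the paper's actual InvalidResync() function is defined elsewhere and may test more than this. The names `delta_rr_min` and `resync_spacing_ok` are hypothetical.

```python
def delta_rr_min(f: int, delta_aa: int) -> int:
    """Minimum gap, in clock ticks, between consecutive Resync messages
    from one good node: 2 * F * Delta_AA + 1."""
    return 2 * f * delta_aa + 1


def resync_spacing_ok(prev_tick: int, new_tick: int, f: int, delta_aa: int) -> bool:
    """True when two Resync arrivals from the same source are far enough
    apart to be consistent with a good sender."""
    return new_tick - prev_tick >= delta_rr_min(f, delta_aa)
```

With F = 2 and a Delta_AA of 10 ticks the minimum gap is 41 ticks, so a second Resync arriving 40 ticks after the first would fail this spacing test.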

Theorem RestoreToMaintain - A good node in the Restore state will always transition to
the Maintain state.

Proof - Let us consider the worst case scenario where a node wakes up with its internal
variables randomly set except that its state is set to the Restore state. A sequence of activities of
the good node $N_5$, for a system of K = 7, F = 2 and G = 5 nodes, is depicted in Figure 8. The
activities of the good node are partitioned in different zones along the time axis. The following
symbols are used in this figure.

X = don't care

A = an Affirm message transmitted

$R_{gi}$ = a Resync message received from the i-th good node

$R_{fj}$ = a Resync message received from the j-th faulty node

[Figure 8 timeline, partitioned into zones along the time axis:]

Out: A A A A A A A A A A A A A A A A A A A A
In:  X X X X X X X X R_f1 R_f2 R_g1 R_g2 R_g3 R_g4 R_f1 R_f2

(bracketed span in the figure: $\Delta_{RR}$ for faulty node N_f1)

Figure 8. Worst case sequence of activities of a good node after random start up for F = 2.

Since receiving a Resync message can force the node to remain in a state of transition,
only the Resync messages are shown in this figure. Also, since receiving one Resync message
during the current $\Delta_{AA}$ interval can prevent the node from transitioning to the Maintain state, the
sequence of activities is shown for the worst case scenario where only one Resync message is
received within the time interval of any two consecutive transmissions of Affirm messages, i.e. at
every $\Delta_{AA}$.

Zone 1: If a good node $N_i$ perceives that it has received a Resync message from another good
node $N_j$, it follows from Lemma DeltaRRmin that for the duration of $\Delta_{RR,min}$ a Resync message
from $N_j$ will be rejected. Therefore, for the worst case scenario, let us assume that a good node
does not receive enough valid messages and accept events will not take place for the first
$(2F+1)\Delta_{AA} > \Delta_{RR,min}$ clock ticks. However, it follows from Lemma TransmitEveryΔAA that all
good nodes transmit a message every $\Delta_{AA}$ interval. Therefore, by Lemma DeltaRRmin, after
$\Delta_{RR,min}$ a good node will receive at least $T_A$ messages for all subsequent $\Delta_{AA}$ intervals and
consequently accept events will take place during those intervals.

Zone 2: It follows from the protocol that a node has to wait for the minimum transitory delay of
$2F\Delta_{AA}$ before transitioning to the Maintain state. To prevent the node from transitioning to the
Maintain state, the minimum transitory delay should not be met. Therefore, a Resync message
has to be received at the last $\Delta_{AA}$ interval. As a result, the duration of this zone will be $(2F - 1)$
$\Delta_{AA}$ intervals.

Zone 3: To prolong the duration of the Restore state for this node, $N_5$, the faulty nodes transmit
Resync messages interleaved with the Resync messages from the other good nodes, $N_1$ through
$N_4$, such that these messages are perceived valid by $N_5$. From Lemma DeltaRRmin, Resync
messages have to be at least $\Delta_{RR,min}$ apart in order to be considered valid. Since there are F faulty
nodes, the remaining F + 1 intervals between the consecutive Resync messages of a faulty node
must be filled by Resync messages from other good nodes. As expressed in function Retry(), it
takes $T_R$ valid Resync messages for a good node to transition from the Maintain state to the
Restore state.

Since $T_R = F + 1$, after transmission of $T_R$ Resync messages from as many good nodes, $N_1$
through $N_3$, all other good nodes that remained in the Maintain state, e.g. $N_4$, will transition to the
Restore state. Therefore, at this point, all good nodes will have transitioned to the Restore state.
Also, none of the good nodes that had transitioned to the Restore state can meet the transitory
conditions and transition back to the Maintain state in the meantime.

The longest sequence results when F Resync messages from as many faulty nodes are
followed by 2F Resync messages from as many good nodes, $N_1$ through $N_4$, followed by F
additional Resync messages from as many faulty nodes as depicted in Figure 8. Therefore, the
maximum duration of this zone will be F + 2F + F = 4F consecutive $\Delta_{AA}$ intervals.

Following the time line of activities in the figure, the node has been in the Restore state
for the maximum possible transitory delay of (2F + 1) + (2F - 1) + (F + 2F + F) = 8F.

Zone 4: At this point, no other Resync messages are expected to arrive from other good nodes
and from Lemma DeltaRRmin any additional Resync messages from the faulty nodes will be
considered as invalid. Therefore,

$\mathit{State\_Timer}(t) = \mathit{State\_Timer}(t_0) + 8F \le P_T$

where,

$\mathit{State\_Timer}(t_0) = 0$,
or,

$0 < \mathit{State\_Timer}(t_0) \le P_T$.

The subsequent behavior of the node is, therefore, dependent on the initial value of its
State_Timer, i.e. $\mathit{State\_Timer}(t_0)$. There are two possible initial scenarios for this node's
State_Timer: either the State_Timer is reset to zero or it holds a non-zero value within its range,
i.e. its initial value is less than or equal to $P_T$. If the State_Timer is initially reset, unless this
node times out, it will have to transition out of the Restore state at the next $\Delta_{AA}$. So, assuming $P_T$
is large enough so that the node does not time out, it has to transition to the Maintain state at the
next $\Delta_{AA}$ as shown in Figure 8.

If the State_Timer is initially non-zero and the current value of the State_Timer is less
than $P_T$, the node does not time out and transitions to the Maintain state at the next $\Delta_{AA}$, as shown
in Figure 8. Otherwise, the current value of the State_Timer is $P_T$ (hence the worst case
scenario), and this node times out at the next $\Delta_{AA}$, transmits a Resync message and remains in the
Restore state as shown in Figure 9. The sequence of input messages in Figure 9 reveals a
potential circular pattern in the behavior of the node. However, unlike the initial scenario for this
case where State_Timer was not reset to zero, the State_Timer is now reset to zero.

Out: A A A A A A A A A A A A A A A R A A A A A
In:  X X X X X X X R_f1 R_f2 R_g1 R_g2 R_g3 R_g4 R_f1 R_f2 X X X X R_f1

Figure 9. Worst case sequence of activities of a good node at random start up for F = 2.

Using a similar argument as for the first case where $\mathit{State\_Timer}(t_0) = 0$, this node will
transition to the Maintain state within the next $P_T$. Therefore, a node will always transition from
the Restore state to the Maintain state. ♦

From Theorem RestoreToMaintain, the maximum possible transitory delay for a node in
the Restore state is 8F. However, in order to allow the node to transition to the Maintain state at
the next $\Delta_{AA}$, it has to be prevented from timing out. Therefore, the required minimum period,
$P_{T,min}$, is constrained to be $P_{T,min} = 8F + 2$.

Although $P_T$ can be any value larger than $P_{T,min}$, it follows from Theorem
RestoreToMaintain that it cannot exceed that minimum value. Also, in order to expedite the
self-stabilization process, the convergence time has to be minimized. Thus, $P_T$ is constrained to
$P_{T,min}$. The self-stabilization period for the Maintain state, $P_M$, is typically much larger than $P_T$.
Thus, $P_M$ is constrained to be $P_M \ge P_T$.
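The zone accounting in the proof of Theorem RestoreToMaintain reduces to simple arithmetic in units of Delta_AA intervals. The sketch below reproduces it; the function names are hypothetical and the code only restates the counts given in the text.

```python
def worst_case_restore_intervals(f: int):
    """Zone lengths, in Delta_AA intervals, for the worst case start-up.

    Returns (zone1, zone2, zone3, total) where total is the maximum
    transitory delay accumulated before Zone 4.
    """
    zone1 = 2 * f + 1          # no accept events yet
    zone2 = 2 * f - 1          # minimum transitory delay not yet met
    zone3 = f + 2 * f + f      # faulty, good, faulty Resync messages
    return zone1, zone2, zone3, zone1 + zone2 + zone3


def p_t_min(f: int) -> int:
    """Minimum Restore-state period that avoids a premature timeout: 8F + 2."""
    return 8 * f + 2
```

For the F = 2 system of Figure 8 this gives zones of 5, 3, and 8 intervals, a total of 16 = 8F, and a minimum period of 18.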

Corollary RestoreToMaintainWithin2P_T - A good node in the Restore state will always
transition to the Maintain state within $2P_T$.

Proof - From the proof of Theorem RestoreToMaintain, a node in the Restore state will
either transition to the Maintain state within the first $P_T$, or it will time out and remain in that
state. For the latter case, it also follows that the node will transition to the Maintain state within
the next $P_T$; therefore, the node will transition to the Maintain state within $2P_T$. ♦

All good nodes validate an Affirm message from a good node if the minimum arrival time
requirement for that message is not violated. By Lemma DeltaRRmin, consecutive Resync
messages from a good node are always more than $\Delta_{RR,min}$ apart. Therefore, after a random
start-up, it takes more than $\Delta_{RR,min}$ clock ticks for Resync messages from a good node to be accepted
by all other good nodes. If a node is in the Restore state, from Theorem ResyncWithinP_T, it will
either time out and transmit a Resync message within $P_T$ or, from Theorem RestoreToMaintain
and Corollary RestoreToMaintainWithin2P_T, it will transition to the Maintain state within $2P_T$.
Therefore, for the proof of this protocol, and for the following lemmas and theorems, the state of
the system is considered after $2P_T\,\Delta_{AA}$ clock ticks from a random start. At this point, the system
is in one of the following three states and all Resync messages from the good nodes are at least
$\Delta_{RR,min}$ apart and all Affirm messages from the good nodes meet their timing requirements at the
receiving good nodes.

1. None of the good nodes are in the Maintain state

2. All good nodes are in the Maintain state

3. Some of the good nodes are in the Maintain state

Theorem ConvergeNoneMaintain - A system of $K \ge 3F + 1$ nodes, where none of the good
nodes are in the Maintain state and have not met the transitory conditions, will always converge.

Proof - Since none of the good nodes are in the Maintain state, they are in the Restore
state either because they have just transitioned there or are forced to remain there due to
receiving Resync messages either from other good nodes or from the faulty nodes. Since these
nodes accept each other's messages, they will receive at least $T_A$ valid messages (Affirm or
Resync) from each other every $\Delta_{AA}$, and will accept and transmit Affirm messages at every $\Delta_{AA}$
interval.


The Earliest a good node transitions to the Maintain state (EM) is after it has remained in
the Restore state for the minimum duration of the transitory delay plus at least two accept events
after the last good node transitioned to the Restore state. The earliest the first of the two accept
events happens is D ticks after the transmission of the last Resync message. Therefore the EM
happens at $D + \Delta_{AA}$. The Latest a good node transitions to the Maintain state (LM) is after
remaining in the Restore state for the maximum duration of the transitory delay, i.e. after
receiving Resync messages from all faulty nodes. In this case, the LM happens at the last good
node transmitting the Resync message, i.e. at $(2F + F)\Delta_{AA} = 3F\Delta_{AA}$ since its transition to the
Restore state. So, the time difference between the LM and EM nodes is given by

$\Delta_{LMEM} = 3F\Delta_{AA} - (\Delta_{AA} + D) = (3F - 1)\,\Delta_{AA} - D$. ♦


The self-stabilization precision, $\Delta_{Precision}$, is the maximum time difference between the
Local_Timers of any two good nodes when the system is stabilized. It is, therefore, the
guaranteed precision of the protocol. From Theorem ConvergeNoneMaintain, the initial
precision after the resynchronization is the maximum value of $\Delta_{LMEM}$.

After the initial synchrony and due to the drift rate of the oscillators, Local_Timers of the
good nodes will deviate from the initial precision. This phenomenon is depicted in Figure 10.


[Figure 10: timelines of the Slow and Fast good nodes, annotated with $\Delta_{LMEM}$ and $\Delta_{Precision}$.]

Figure 10. The self-stabilization precision.


Therefore, the guaranteed self-stabilization precision, $\Delta_{Precision}$, after an elapsed time of
$P_M\,\Delta_{AA}$ clock ticks, is bounded by

$\Delta_{Precision} = \Delta_{LMEM} + \Delta_{Drift}$

where the amount of drift from the initial precision is given by

$\Delta_{Drift} = ((1 + \rho) - 1/(1 + \rho))\,P_M\,\Delta_{AA}$.


The factors $(1 + \rho)$ and $1/(1 + \rho)$ are, respectively, associated with the slowest and fastest nodes in
the system. Therefore,

$\Delta_{Precision} = (3F - 1)\,\Delta_{AA} - D + \Delta_{Drift}$.

Corollary MutuallyStabilized - All good nodes mutually perceive each other as being in the
Maintain state.

Proof - It follows from Theorem ConvergeNoneMaintain that upon convergence and as
the good nodes transition to the Maintain state, they mutually perceive each other to be in the
Maintain state. ♦

Theorem ConvergeAllMaintain - A system of $K \ge 3F + 1$ nodes, where all good nodes are in
the Maintain state, will always converge.

Proof - Since no assumptions are made about the relative timing of the good nodes,
$\Delta_{LocalTimer}(t) > \Delta_{Precision}$ is possible. In this case, all good nodes believe themselves to be
synchronized even though the system is not.

It follows from the protocol, Theorem ResyncWithinP_T and Theorem RestoreWithinP_M,
that all good nodes will eventually time out, transition to the Restore state, and transmit Resync
messages. The first $(T_R - 1)$ good nodes that transition to the Restore state may transition back to
the Maintain state before all other good nodes transition to the Restore state. A good node in the
Maintain state keeps track of other nodes that have transitioned to the Restore state. Therefore,
after the $T_R$-th good node transitions to the Restore state, the remaining good nodes, a total of F
nodes, will receive $T_R$ Resync messages from as many good nodes, will transition to the Restore
state and transmit Resync messages within the next $\Delta_{AA}$. Any of the first $(T_R - 1)$ good nodes that
had transitioned to the Restore state and then back to the Maintain state, will now receive
$(T_R + 1)$ Resync messages within $2\Delta_{AA}$ from as many good nodes, will transition to the Restore
state, and transmit Resync messages within the next $\Delta_{AA}$. At this point all good nodes in the
system are in the Restore state, are within $2\Delta_{AA}$ of each other, and none of them has met the
transitory conditions. It follows from Theorem ConvergeNoneMaintain that such a system
always converges. ♦

Theorem ConvergeSomeMaintain - A system of $K \ge 3F + 1$ nodes, where some of the good
nodes are in the Maintain state, will always converge.

Proof - The good nodes that are in the Restore state are there either because they have just
transitioned there or are forced to remain there due to receiving Resync messages either from
other good nodes or from the faulty nodes. Furthermore, their transitions to the Restore state are
recorded by the good nodes that are in the Maintain state. It follows from Lemma
MaintainWithinP_T that unless these unstabilized nodes time out within $P_T$, they will transition to
the Maintain state. It follows from the protocol and Theorem RestoreWithinP_M that all good
nodes that are in the Maintain state will eventually time out, transition to the Restore state, and
transmit Resync messages.

There are two possible scenarios for the system. Since the transitions to the Restore state
are recorded by other good nodes in the Maintain state, as soon as $T_R$ good nodes have
transitioned to the Restore state, in a similar argument as in Theorem ConvergeAllMaintain, the
remaining good nodes will also transition to the Restore state. At this point, the system consists
of all good nodes in the Restore state where none of them has met the transitory conditions. It
follows from Theorem ConvergeNoneMaintain that such a system always converges.

The second possibility is that if the good nodes in the Restore state transition back to the
Maintain state, the system consists of all good nodes in the Maintain state and it follows from
Theorems ConvergeNoneMaintain and ConvergeAllMaintain that such a system always
converges. ♦

Lemma PrecisionLargerThanTD - The self-stabilization precision, $\Delta_{Precision}$, is greater than the
minimum transitory delay ($TD_{min}$) for $F \ge 2$.

Proof - In other words, $\Delta_{Precision} - TD_{min} > 0$.

$\Delta_{Precision} = (3F - 1)\,\Delta_{AA} - D + \Delta_{Drift}$

$\Delta_{Precision} - TD_{min} = (3F - 1)\,\Delta_{AA} - D + \Delta_{Drift} - 2F\Delta_{AA} = (F - 1)\,\Delta_{AA} - D + \Delta_{Drift}$.

For $F \ge 2$,

$(F - 1)\,\Delta_{AA} - D + \Delta_{Drift} > 0$. ♦
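The margin computed in this lemma can be checked numerically. The sketch below is illustrative only; `precision_margin` is a hypothetical name, and the sample values assume D is smaller than Delta_AA, in line with the relation Delta_AA > D + d used elsewhere in the proofs.

```python
def precision_margin(f: int, delta_aa: int, d: int, drift: float = 0) -> float:
    """Delta_Precision minus the minimum transitory delay 2 * F * Delta_AA.

    Algebraically this equals (F - 1) * Delta_AA - D + Delta_Drift.
    """
    precision = (3 * f - 1) * delta_aa - d + drift
    td_min = 2 * f * delta_aa
    return precision - td_min
```

With Delta_AA = 10 and D = 3 and no drift, the margin is 7 for F = 2 and grows by Delta_AA for each additional faulty node; for F = 1 it reduces to Delta_Drift - D, which need not be positive.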

Theorem ClosureAllMaintain - A system of $K \ge 3F + 1$ nodes, where all good nodes have
converged such that all good nodes are mutually stabilized with each other (in other words, all
good nodes are in the Maintain state where $\Delta_{LocalTimer}(t) \le \Delta_{Precision}$), shall remain within the
self-stabilization precision $\Delta_{Precision}$.

Proof - Since all good nodes are in the Maintain state, it follows from Theorem
RestoreWithinP_M that they will transition to the Restore state within $P_M$. As they transmit
Resync messages, their transitions to the Restore state are recorded by other good nodes that are
in the Maintain state. Since the system is stabilized, the good nodes will transition to the Restore
state within $\Delta_{Precision}$ of each other. However, since from Lemma PrecisionLargerThanTD, and
for $F \ge 2$, $\Delta_{Precision}$ is greater than the minimum transitory delay, some good nodes can
potentially transition to the Restore state and then to the Maintain state before all good nodes
transition to the Restore state. The proof, therefore, proceeds in the following two parts:

Similar to the proof of Theorem ConvergeAllMaintain, the first $(T_R - 1)$ good nodes that
transition to the Restore state may transition to the Maintain state before all other good nodes
transition to the Restore state. Therefore, after the $T_R$-th good node transitions to the Restore state,
the remaining good nodes, a total of F nodes, will have received $T_R$ Resync messages from as
many good nodes, will transition to the Restore state and transmit Resync messages within the
next $\Delta_{AA}$. Any of the first $(T_R - 1)$ good nodes that had transitioned to the Restore state and then
to the Maintain state, will now receive $(T_R + 1)$ Resync messages within $2\Delta_{AA}$ from as many good
nodes, will transition to the Restore state, and transmit Resync messages within the next $\Delta_{AA}$. At
this point all good nodes in the system are in the Restore state, are within $2\Delta_{AA}$ of each other, and
none of them has met the transitory conditions. It follows from Theorem
ConvergeNoneMaintain that such a system always converges to within $\Delta_{Precision}$.

On the other hand, if after transitioning to the Restore state, none of the good nodes
transition to the Maintain state until all good nodes transition to the Restore state, the system
consists of all good nodes in the Restore state where none of them has met the transitory
conditions. It follows from Theorem ConvergeNoneMaintain that such a system always
converges to within $\Delta_{Precision}$. ♦

Corollary StateTimerLessThanPrecision - In a stabilized system and during the
re-stabilization process, the maximum value of the State_Timer is always less than the
self-stabilization precision $\Delta_{Precision}$.

Proof - From the protocol, the State_Timer is reset when the node transitions to either the
Restore state or the Maintain state. It follows from the first part of the proof of Theorem
ClosureAllMaintain that some good nodes that transition to the Restore state may transition back
to the Maintain state before others. The value of the State_Timer of such nodes does not exceed
$\Delta_{Precision}$. In other words, for these good nodes,

$(\mathit{State\_Timer})\,\Delta_{AA} = \Delta_{Precision} - 2F\Delta_{AA} + (D + d)$.

Since $\Delta_{AA} > D + d$,

$(\mathit{State\_Timer})\,\Delta_{AA} < \Delta_{Precision} - 2F\Delta_{AA} + \Delta_{AA}$,

$(\mathit{State\_Timer})\,\Delta_{AA} \le \Delta_{Precision} - F\Delta_{AA} < \Delta_{Precision}$. ♦

Therefore, the Local_Timer can be reset at any point where State_Timer is greater than or
equal to the precision. In order to expedite the self-stabilization process, Local_Timer is reset
when State_Timer reaches the next integer value greater than $\Delta_{Precision}$, i.e.
$\lceil\Delta_{Precision}\rceil = \mathrm{truncate}(\Delta_{Precision} + 0.5)$. Alternatively, if the amount of drift is such that
$\Delta_{Drift} \le (\Delta_{AA} + D)$, the Local_Timer can be reset when State_Timer reaches 3F.

$\Delta_{Precision} \le 3F\Delta_{AA}$

$(3F - 1)\,\Delta_{AA} - D + \Delta_{Drift} \le 3F\Delta_{AA}$

$-\Delta_{AA} - D + \Delta_{Drift} \le 0$

$\Delta_{Drift} \le \Delta_{AA} + D$

Corollary SteadyStateConvergeTime - In a stabilized system, the maximum convergence
time is less than $6F\Delta_{AA}$.

Proof - It follows from the first part of Theorem ClosureAllMaintain that the time interval
from when the first good node transitions to the Restore state until all good nodes transition to
the Maintain state is given by

$\Delta_{Precision}$ + Latest to Maintain state (LM).

From Theorem ConvergeNoneMaintain, $LM = (3F - 1)\,\Delta_{AA} - D$, therefore, the interval is

$((3F - 1)\,\Delta_{AA} - D + \Delta_{Drift}) + (3F - 1)\,\Delta_{AA} - D = (6F - 2)\,\Delta_{AA} - 2D + \Delta_{Drift}$.

Since $\Delta_{AA} > D + d$,

$(6F - 2)\,\Delta_{AA} - 2D + \Delta_{Drift} > (6F - 4)\,\Delta_{AA} + \Delta_{Drift}$,

but since typically $\Delta_{AA} > \Delta_{Drift}$,

$(6F - 4)\,\Delta_{AA} + \Delta_{Drift} < 6F\Delta_{AA} < P_T\,\Delta_{AA}$. ♦
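The proof above first bounds the interval from below and then bounds the smaller quantity from above. As a complementary sketch, and assuming only that $D \ge 0$ and $\Delta_{Drift} < \Delta_{AA}$ (the "typical" case the proof already invokes), the interval $T$ from the first transition to the Restore state until all good nodes are in the Maintain state can be bounded from above directly. The symbol $T$ is introduced here for illustration and does not appear in the original.

```latex
\begin{align*}
T &= (6F - 2)\,\Delta_{AA} - 2D + \Delta_{Drift} \\
  &\le (6F - 2)\,\Delta_{AA} + \Delta_{Drift} && \text{since } D \ge 0 \\
  &< (6F - 1)\,\Delta_{AA}                    && \text{since } \Delta_{Drift} < \Delta_{AA} \\
  &< 6F\,\Delta_{AA}.
\end{align*}
```

This reaches the same conclusion as the corollary, $T < 6F\Delta_{AA}$, and with $P_T = 8F + 2$ it also gives $T < P_T\,\Delta_{AA}$.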

Theorem LocalTimerWithinPrecision - The difference of Local_Timers of all good nodes
in a stabilized system of $K \ge 3F + 1$ nodes will always be within the self-stabilization
precision, i.e. $\Delta_{LocalTimer}(t) \le \Delta_{Precision}$.

Proof - Since the Local_Timer is reset when State_Timer = $\lceil\Delta_{Precision}\rceil$, it follows from
Corollary StateTimerLessThanPrecision that, during the re-stabilization process, the
State_Timer never reaches this value and thus the Local_Timer will not be reset during this
process. On the other hand, it follows from Theorem ClosureAllMaintain that the good nodes
will remain within $\Delta_{Precision}$ of each other, thus, $\Delta_{LocalTimer}(t) \le \Delta_{Precision}$. ♦

Theorem StabilizeFromAnyState - A system of $K \ge 3F + 1$ nodes self-stabilizes from any
random state after a finite amount of time.

Proof - The proof of this theorem consists of proving the convergence and closure
properties as defined in the Self-Stabilizing Clock Synchronization Problem section.

Assumptions: All good nodes are active and the system operates within the system assumptions.

Convergence - From any state, the system converges to a self-stabilized state after a finite
amount of time.

1. $\forall N_i, N_j \in K_G$, $\Delta_{LocalTimer}(C) \le \Delta_{Precision}$.

2. $\forall N_i, N_j \in K_G$, at $C$, $N_i$ perceives $N_j$ as being in the Maintain state.

Proof - The proof is done in the following four parts. The approach for the proof is
depicted in Figure 11. The system is shown to converge from any state and upon convergence
maintain the closure property. The figure is partitioned via a dashed line into two regions. The
left region depicts the state of the system in the convergence process. The right region depicts
the system operating in the steady state and maintaining the self-stabilization precision.

In this figure, the states All, None, and Some represent one of three possible states of the
system after $2P_T\,\Delta_{AA}$ clock ticks from a random start. The propositions labeled as theorems
indicate that a transition from one state to another eventually takes place.


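[Figure 11: state diagram split by a dashed line into a Convergence region (left) and a Closure
region (right). The transitions among the states All, None, and Some are labeled Theorem
ConvergeAllStabilized, Theorem ConvergeNoneStabilized, and Theorem ConvergeSomeStabilized; the
Closure region is labeled Theorems ClosureAllStabilized and LocalTimerWithinPrecision.]
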

Figure 11. Approach for proof of convergence. 

Convergence - None of the good nodes are in the Maintain state. 

Proof - It follows from Theorems ConvergeNoneMaintain and ClosureAllMaintain that
such system always self-stabilizes.



Convergence - All good nodes are in the Maintain state.

Proof - It follows from Theorems ConvergeNoneMaintain, ConvergeAllMaintain and
ClosureAllMaintain that such system always self-stabilizes.

Convergence - Some of the good nodes are in the Maintain state.

Proof - It follows from Theorems ConvergeNoneMaintain, ConvergeAllMaintain,
ConvergeSomeMaintain, and ClosureAllMaintain that such system always self-stabilizes.

Mutually Stabilized - $\forall N_i, N_j \in K_G$, at $C$, $N_i$ perceives $N_j$ as being in the Maintain
state.

Proof - It follows from Corollary MutuallyStabilized that all good nodes mutually
perceive each other to be in the Maintain state.

Closure - When all good nodes have converged such that $\Delta_{LocalTimer}(C) \le \Delta_{Precision}$ at time $C$, the
system shall remain within the self-stabilization precision $\Delta_{Precision}$ for $t \ge C$, for real time $t$.

$\forall N_i, N_j \in K_G$, $t \ge C$, $\Delta_{LocalTimer}(t) \le \Delta_{Precision}$.

Proof - It follows from Theorems ClosureAllMaintain and LocalTimerWithinPrecision that such
system always remains stabilized and $\Delta_{LocalTimer}(t) \le \Delta_{Precision}$ for $t \ge C$. ♦

This protocol neither maintains a history of past behavior of the nodes nor does it attempt 
to classify the nodes into good and faulty ones. Since this protocol self-stabilizes from any state, 
initialization and/or reintegration are not treated as special cases. Therefore, a reintegrating node 
will always be admitted to participate in the self-stabilization process as soon as it becomes
active. Continual transmission of the Affirm messages by the good nodes expedites the 
reintegration process. 

Lemma ResyncWithinP_TPlusP_M - A good node transmits a Resync message within at most
$(P_T + P_M)\,\Delta_{AA}$ clock ticks.

Proof - From Theorem ResyncWithinP_T, a node in the Restore state will time out within
$P_T\,\Delta_{AA}$ clock ticks. So, if a node transitions from the Restore state to the Maintain state before it
times out, it had remained in the Restore state for at most $(P_T - 1)$. From Theorem
RestoreWithinP_M, the node will time out within $P_M$. Therefore, within at most $(P_T + P_M)\,\Delta_{AA}$
clock ticks a node transmits a Resync message. ♦

Theorem ConvergeTime - A system of $K \ge 3F + 1$ nodes converges from any random state to
a self-stabilized state within $C = (2P_T + P_M)\,\Delta_{AA}$ clock ticks.

Proof - It follows from Lemma ResyncWithinP_TPlusP_M that a good node transmits a
Resync message within at most $(P_T + P_M)\,\Delta_{AA}$ clock ticks. It follows from Theorems
ConvergeNoneMaintain, ConvergeAllMaintain, ConvergeSomeMaintain, ClosureAllMaintain,
LocalTimerWithinPrecision, and StabilizeFromAnyState that the system always converges. It
also follows from these Theorems, Theorem RestoreToMaintain, and Corollary
SteadyStateConvergeTime that the system converges and all good nodes will transition to the
Maintain state within the next $P_T\,\Delta_{AA}$ clock ticks. Therefore, the system converges within
$(P_T + P_M + P_T)\,\Delta_{AA}$. Thus, $C = (2P_T + P_M)\,\Delta_{AA}$. ♦

If $P_M = P_T$, then $C = 3P_M$, but since typically $P_M \gg P_T$, $C$ can be approximated
by $C \approx P_M$. Therefore, the convergence time of this protocol is a linear function of $P_M$.
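To make the linear dependence on P_M concrete, the bound of Theorem ConvergeTime can be evaluated for sample parameters. This is a small sketch with hypothetical function names; P_T and P_M are taken as counts of Delta_AA intervals, as in the theorem statement.

```python
def convergence_time(p_t: int, p_m: int, delta_aa: int) -> int:
    """Upper bound C = (2 * P_T + P_M) * Delta_AA, in clock ticks."""
    return (2 * p_t + p_m) * delta_aa


def ratio_to_period(p_t: int, p_m: int) -> float:
    """C divided by the Maintain-state period P_M * Delta_AA."""
    return (2 * p_t + p_m) / p_m
```

With P_T = 18 (the minimum for F = 2) the ratio is 3 when P_M equals P_T and falls to about 1.036 when P_M is 1000, which is the sense in which C is approximately P_M for large P_M.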

Theorem StopContinuousTransmit - A stabilized system of K ≥ 3F + 1 nodes does not have to 
transmit Affirm messages continuously. 

Proof - When the system is stabilized, all good nodes are within Δ_Precision of each other. It 
follows from Corollary MutuallyStabilized that all good nodes mutually perceive each other to be 
in the Maintain state. Also, it follows from the protocol that the good nodes reset their 
Local_Timers ⌈Δ_Precision⌉ clock ticks after transitioning to the Maintain state. Since the good 
nodes will not engage in another round of the self-stabilization process until they time out, 
stopping transmission of Affirm messages at this point will not affect the self-stabilization status 
of the system for the remainder of the current self-stabilization period. ♦ 


6. Overhead of the Protocol 

Since only two self-stabilization messages, namely Resync and Affirm messages, are 
required for the proper operation of this protocol, a single bit suffices to represent both messages. 
Therefore, for a data message w bits wide, the self-stabilization overhead will be 1/w per 
transmission. 

The continual aspect of the proposed protocol requires reaffirmation of the self-stabilization 
status of good nodes by periodic transmission of Affirm messages at Δ_AA intervals. As a result, 
the maximum number of self-stabilization messages transmitted within any time interval is 
deterministic and is a function of that time interval. In particular, a good node transmits at most 
P_Effective / Δ_AA self-stabilization messages during a period of P_Effective, 

where 

P_Effective = time difference between any two consecutive resets of the Local_Timer 

P Effective — P M + 6 F . 

Therefore, 

Number of messages sent by a node = P_Effective / Δ_AA 

and 

Total number of messages sent by K nodes = K · P_Effective / Δ_AA. 
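
The overhead figures above can be evaluated with a few lines of Python. This is a minimal sketch: the function names and the sample values (a 32-bit data message, P_Effective of 1000 clock ticks, Δ_AA of 1 clock tick, K = 4) are assumptions for illustration, and P_Effective is taken as an input rather than derived.

```python
def overhead_fraction(w: int) -> float:
    """Self-stabilization overhead per transmission for a w-bit data message (1/w)."""
    if w <= 0:
        raise ValueError("message width must be positive")
    return 1 / w


def messages_per_node(p_effective: float, delta_aa: float) -> float:
    """Upper bound on self-stabilization messages one good node sends in P_Effective."""
    if delta_aa <= 0:
        raise ValueError("Delta_AA must be positive")
    return p_effective / delta_aa


def total_messages(k: int, p_effective: float, delta_aa: float) -> float:
    """Upper bound on messages sent by all K nodes in P_Effective."""
    return k * messages_per_node(p_effective, delta_aa)


# Hypothetical sample values, not taken from the report.
print(overhead_fraction(32))            # one extra bit per 32-bit message
print(messages_per_node(1000, 1))       # per-node bound
print(total_messages(4, 1000, 1))       # bound for a 4-node system
```

Because both counts scale linearly with P_Effective / Δ_AA, the message load for any observation window can be bounded in advance.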


7. Achieving Tighter Precision 

Since the self-stabilization messages are communicated at Δ_AA intervals, if Δ_AA, and hence 
Δ_Precision, are larger than the desired precision, the system is said to be Coarsely Synchronized. 
Otherwise, the system is said to be Finely Synchronized. If the granularity provided by the 
self-stabilization precision is coarser than desired, a higher synchronization precision can be achieved 
in a two-step process. First, the system must be Coarsely Synchronized from any initial state, with 
a guarantee that it remains Coarsely Synchronized and operates within a known 
precision, Δ_Precision. The second step, in conjunction with the Coarse Synchronization protocol, is 
to utilize a proven protocol that is based on the initial synchrony assumptions to achieve 
optimum precision of the synchronized system, as depicted in Figure 12. 



Figure 12. The interplay of Coarse and Fine level protocols. 

As depicted in Figure 12, the Coarse Synchronization protocol initiates the start of the 
Fine Synchronization protocol if a tighter precision of the system is desired. The Coarse 
protocol maintains self-stabilization of the system while the Fine Synchronization protocol 
increases the precision of the system. 
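
The distinction between the two synchronization levels can be expressed as a simple comparison. The sketch below is only an illustration of the definitions in this section; the function names and sample numbers are hypothetical, and it does not model the Fine Synchronization protocol itself.

```python
def synchronization_level(delta_precision: float, desired_precision: float) -> str:
    """Classify the system using the definitions of this section.

    The system is Coarsely Synchronized when the self-stabilization precision
    is larger than the desired precision; otherwise it is Finely Synchronized.
    """
    if delta_precision <= 0 or desired_precision <= 0:
        raise ValueError("precisions must be positive")
    return "coarse" if delta_precision > desired_precision else "fine"


def needs_fine_protocol(delta_precision: float, desired_precision: float) -> bool:
    """True when a second, finer protocol must be started on top of the coarse one."""
    return synchronization_level(delta_precision, desired_precision) == "coarse"


# Hypothetical sample values in clock ticks, not taken from the report.
print(synchronization_level(10, 2), needs_fine_protocol(10, 2))
print(synchronization_level(2, 10), needs_fine_protocol(2, 10))
```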


8. Simulations and Model Checking 

Several approaches were taken toward verification of this protocol. The first is a VHSIC 
(Very High Speed Integrated Circuit) Hardware Description Language (VHDL) simulation model 
that confirms the proper operation of the protocol for specific cases. The VHDL environment is 
primarily for simulation of specific scenarios, where examination of the known cases requires 
proper setup of the system for each case separately. As the number of cases to be examined 
increases, this process becomes impractical. Therefore, symbolic model checkers are used, which 
can examine all possible scenarios. The Symbolic Model Verifier (SMV) was used for the second 
modeling of this protocol. It was executed on a PC with 4GB of memory running Linux. 

The topology considered is a system of 4 nodes, as shown in Figure 13, such that all 
nodes can directly communicate with all other nodes, where K = 4, G = 3 and F = 1. With D = 1 
and d = 0, and Δ_AA = D + d = 1, the number of states needed to represent all possible combinations 
of initial states for the entire 4-node system is 4.26 × 10^46 states. However, with proper 
abstractions and employing a number of reduction techniques the state-space is reduced to 
5.13 × 10^24 states. SMV is able to handle all possible scenarios and the protocol is exhaustively 
model checked. 
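
To put the reported numbers in perspective, the following Python sketch checks the model-checked configuration and computes how much the abstractions shrink the state space. The helper names are hypothetical, the check K ≥ 3F + 1 assumes the usual Byzantine bound, and the "bits" figures are simply base-2 logarithms of the reported state counts, not a description of how the SMV model is encoded.

```python
import math


def check_configuration(k: int, g: int, f: int, big_d: int, small_d: int) -> int:
    """Validate the node counts and return Delta_AA = D + d."""
    if g + f != k:
        raise ValueError("good and faulty nodes must add up to K")
    if k < 3 * f + 1:
        raise ValueError("Byzantine tolerance requires K >= 3F + 1")
    return big_d + small_d


def reduction_summary(full_states: float, reduced_states: float) -> dict:
    """Reduction factor and equivalent number of binary state variables."""
    return {
        "reduction_factor": full_states / reduced_states,
        "bits_full": math.log2(full_states),
        "bits_reduced": math.log2(reduced_states),
    }


# Values reported in this section: K = 4, G = 3, F = 1, D = 1, d = 0.
delta_aa = check_configuration(k=4, g=3, f=1, big_d=1, small_d=0)
summary = reduction_summary(4.26e46, 5.13e24)
print(delta_aa, summary)
```

The reduction is roughly a factor of 8 × 10^21, which corresponds to going from about 155 to about 82 equivalent binary state variables.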



Figure 13. A 4-node fully-connected graph. 

This verification effort was conducted to mechanically verify the claims of the protocol. 
Verification of self-stabilization of a system of 4 nodes in the presence of a Byzantine fault may 
deceptively seem trivial, but to the best of the author's knowledge, no other self-stabilization 
protocol has ever been mechanically verified to accomplish this goal. The amount of memory 
needed for the construction of the Binary Decision Diagram (BDD) readily reached the 4GB 
available on the PC after construction of the state-space. Therefore, model checking of larger 
and more complex systems poses a greater challenge. A detailed description of the 
model-checking efforts for this 4-node system will be the subject of subsequent reports. 


9. Applications 

The proposed self-stabilizing protocol is expected to have many practical applications as 
well as many theoretical implications. Embedded systems, distributed process control, 
synchronization, inherent fault tolerance (which also includes Byzantine agreement), computer 
networks, the Internet, Internet applications, security, safety, automotive, aircraft, wired and 
wireless telecommunications, graph theoretic problems, leader election, time division multiple 
access (TDMA), and the Scalable Processor-Independent Design for Enhanced Reliability 
(SPIDER) project [Torres 2005] at NASA-LaRC are a few examples. 
These are some of the many areas of distributed systems that can use self-stabilization in order to 
design more robust distributed systems. 


3 Symbolic Model Verifier (SMV): http://www-2.cs.cmu.edu/~modelcheck/smv.html 

10. Conclusions 


In this paper, a rapid Byzantine self-stabilizing clock synchronization protocol is 
presented that self-stabilizes from any state. It tolerates bursts of transient failures, and 
deterministically converges with a linear convergence time with respect to the self-stabilization 
period. Upon self-stabilization, all good clocks proceed synchronously. This protocol has been 
the subject of a rigorous verification effort. A 4-node system consisting of 3 good nodes and one 
Byzantine faulty node has been proven correct using model checking. 

The proposed protocol explores the timing and event driven facets of the self-stabilization 
problem. The protocol employs monitors to closely observe the activities of the nodes in the 
system. All timing measures of variables are based on the node’s local clock and thus no central 
clock or externally generated pulse is used. 

The proposed protocol is scalable with respect to the fundamental parameters, K, D, and 
d. The self-stabilization precision Δ_Precision, Δ_LocalTimer(t), and self-stabilization periods P_T and P_M 
are functions of K, D and d. The convergence time is deterministic and a linear function of P_T and P_M. 
As K increases so does the number of monitors instantiated in each node. Also, as 
K increases so does the number of communication channels in a fully connected 
communication network. Therefore, although there is no theoretical upper bound on the 
maximum values for the fundamental parameters, implementation of this protocol may introduce 
some practical limitations on the maximum value of these parameters and the choice of topology. 
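
The growth of implementation cost with K can be tabulated with a short Python sketch. Two points are assumptions rather than statements from this section: that each node instantiates one monitor per other node (K - 1 monitors), and that the fault bound is K ≥ 3F + 1. The count of links in a fully connected network, K(K - 1)/2, is a standard graph-theoretic fact. All function names are hypothetical.

```python
def max_byzantine_faults(k: int) -> int:
    """Largest F satisfying K >= 3F + 1 (assumed fault bound)."""
    if k < 1:
        raise ValueError("K must be at least 1")
    return (k - 1) // 3


def monitors_per_node(k: int) -> int:
    """ASSUMPTION: one monitor per other node in the system."""
    return k - 1


def bidirectional_links(k: int) -> int:
    """Number of links in a fully connected network of K nodes."""
    return k * (k - 1) // 2


for k in (4, 7, 10, 13):
    print(k, max_byzantine_faults(k), monitors_per_node(k), bidirectional_links(k))
```

The quadratic growth of the link count is one concrete reason why large K may constrain the choice of topology in practice.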

A proof of this protocol has been presented in this report. The VHDL simulation and 
SMV model-checking efforts that verified the correctness of this self-stabilizing protocol are 
reported. This protocol is expected to be used as the fundamental mechanism in bringing and 
maintaining a system within bounded synchrony. 

Integration of a higher level mechanism with this protocol needs to be further studied. 
Furthermore, if a higher level secondary protocol is non-self-stabilizing, it is conjectured that it 
can be made self-stabilizing when used in conjunction with the protocol presented here. 

We have started formalizing the integration process of other protocols with this protocol 
in order to achieve tighter synchronization. We are also planning to implement this protocol in 
hardware and characterize it in a representative adverse environment. 
Symbols 


ρ                   bounded drift rate with respect to real time 
d                   network imprecision 
D                   event-response delay 
F                   sum of all faulty nodes 
G                   sum of all good nodes 
K                   sum of all nodes 
K_G                 set of all good nodes 
Resync              self-stabilization message 
Affirm              self-stabilization message 
R                   abbreviation for Resync message 
A                   abbreviation for Affirm message 
T_A                 threshold for Accept() function 
T_R                 threshold for Retry() function 
Restore             self-stabilization state 
Maintain            self-stabilization state 
T                   abbreviation for Restore state 
M                   abbreviation for Maintain state 
P_T,min             minimum period while in the Restore state 
P_T                 period while in the Restore state 
P_M                 period while in the Maintain state 
Δ_AA                time difference between the last consecutive Affirm messages 
Δ_RR                time difference between the last consecutive Resync messages 
C                   maximum convergence time 
Δ_LocalTimer(t)     maximum time difference of Local_Timers of any two good nodes at real time t 
Δ_Precision         maximum self-stabilization precision 
Δ_Drift             maximum deviation from the initial synchrony 
N_i                 the i-th node 
M_i                 the i-th monitor of a node 





References 


[Daliot 2003A]      Daliot, Ariel; Dolev, Danny; Parnas, Hanna: Self-Stabilizing Pulse 
                    Synchronization Inspired by Biological Pacemaker Networks, Proceedings 
                    of the Sixth Symposium on Self-Stabilizing Systems, DSN SSS '03, San 
                    Francisco, June 2003. 

[Daliot 2003B]      Daliot, Ariel; Dolev, Danny; Parnas, Hanna: Linear Time Byzantine 
                    Self-Stabilizing Clock Synchronization, Proceedings of 7th International 
                    Conference on Principles of Distributed Systems (OPODIS-2003), La 
                    Martinique, France, December 2003. 

[Dijkstra 1974]     Dijkstra, E.W.: Self-stabilizing systems in spite of distributed control, 
                    Commun. ACM 17, 643-644, 1974. 

[Dolev 1984]        Dolev, Danny; Halpern, J.Y.; Strong, R.: On the Possibility and 
                    Impossibility of Achieving Clock Synchronization. In Proceedings of the 
                    16th Annual ACM STOC (Washington D.C., Apr.). ACM, New York, 
                    1984, pp. 504-511. (Also appears in J. Comput. Syst. Sci.) 

[Dolev 2004]        Dolev, Shlomi; Welch, Jennifer L.: Self-Stabilizing Clock 
                    Synchronization in the Presence of Byzantine Faults. Journal of the ACM, 
                    Vol. 51, No. 5, September 2004, pp. 780-799. 

[Driscoll 2003]     Driscoll, Kevin; Hall, Brendan; Sivencrona, Håkan; Zumsteg, Phil: 
                    Byzantine Fault Tolerance, from Theory to Reality: Computer Safety, 
                    Reliability, and Security, Publisher: Springer-Verlag Heidelberg, 
                    ISBN: 3-540-20126-2, Volume 2788 / 2003, October 2003, pp. 235-248. 

[Kopetz 1997]       Kopetz, H.: Real-Time Systems, Design Principles for Distributed 
                    Embedded Applications, Kluwer Academic Publishers, 
                    ISBN 0-7923-9894-7, 1997. 

[Lamport 1982]      Lamport, Leslie; Shostak, Robert; Pease, Marshall: The Byzantine 
                    Generals Problem, ACM Transactions on Programming Languages and 
                    Systems, 4(3), pp. 382-401, July 1982. 

[Lamport 1985]      Lamport, L.; Melliar-Smith, P. M.: Synchronizing clocks in the presence of 
                    faults, J. ACM, vol. 32, no. 1, pp. 52-78, 1985. 

[Malekpour 2006]    Malekpour, M. R.; Siminiceanu, R.: Comments on the "Byzantine 
                    Self-Stabilizing Pulse Synchronization" Protocol: Counterexamples. 
                    NASA/TM-2006-213951, February 2006, pp. 7. 

[Srikanth 1985]     Srikanth, T. K.; Toueg, S.: Optimal Clock Synchronization. Proceedings 
                    of the Fourth Annual ACM Symposium on Principles of Distributed 
                    Computing, 1985, pp. 71-86. 

[Torres 2005]       Torres-Pomales, W.; Malekpour, M.; Miner, P. S.: ROBUS-2: A 
                    fault-tolerant broadcast communication system. NASA/TM-2005-213540, 
                    March 2005, pp. 201. 

[Welch 1988]        Welch, Jennifer L.; Lynch, Nancy: A New Fault-Tolerant Algorithm for 
                    Clock Synchronization. Information and Computation, volume 77, number 
                    1, April 1988, pp. 1-36. 





REPORT DOCUMENTATION PAGE 




1. REPORT DATE (DD-MM-YYYY): 01-08-2006 
2. REPORT TYPE: Technical Memorandum 
3. DATES COVERED (From - To): 
4. TITLE AND SUBTITLE: A Byzantine-Fault Tolerant Self-Stabilizing Protocol for Distributed Clock 
   Synchronization Systems 
5a. CONTRACT NUMBER: 
5b. GRANT NUMBER: 
5c. PROGRAM ELEMENT NUMBER: 
5d. PROJECT NUMBER: 
5e. TASK NUMBER: 
5f. WORK UNIT NUMBER: 457280.02.07.07 
6. AUTHOR(S): Malekpour, Mahyar R. 
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): NASA Langley Research Center, 
   Hampton, VA 23681-2199 
8. PERFORMING ORGANIZATION REPORT NUMBER: L-19262 
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): National Aeronautics and Space 
   Administration, Washington, DC 20546-0001 
10. SPONSOR/MONITOR'S ACRONYM(S): NASA 
11. SPONSOR/MONITOR'S REPORT NUMBER(S): NASA/TM-2006-214322 
12. DISTRIBUTION/AVAILABILITY STATEMENT: Unclassified - Unlimited; Subject Category 62; 
    Availability: NASA CASI (301) 621-0390 
13. SUPPLEMENTARY NOTES: An electronic version can be found at http://ntrs.nasa.gov 



15. SUBJECT TERMS 

Byzantine; Fault tolerant; Self-stabilization; Clock synchronization; Distributed; Protocol; Algorithm; Model checking; 
Formal proof; Verification 


16. SECURITY CLASSIFICATION OF: a. REPORT: U; b. ABSTRACT: U; c. THIS PAGE: U 
17. LIMITATION OF ABSTRACT: UU 
18. NUMBER OF PAGES: 37 
19a. NAME OF RESPONSIBLE PERSON: STI Help Desk (email: help@sti.nasa.gov) 
19b. TELEPHONE NUMBER (Include area code): (301) 621-0390 

Standard Form 298 (Rev. 8-98) 
Prescribed by ANSI Std. Z39.18 